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Abstract 



Birth-death processes (BDPs) are continuous-time Markov chains that track the number of "par- 
ticles" in a system over time. While widely used in population biology, genetics and ecology, 
statistical inference of the instantaneous particle birth and death rates remains largely limited 
to restrictive linear BDPs in which per-particle birth and death rates are constant. Researchers 
often observe the number of particles at discrete times, necessitating data augmentation pro- 
cedures such as expectation-maximization (EM) to find maximum likelihood estimates. The 
E-step in the EM algorithm is available in closed-form for some linear BDPs, but otherwise 
previous work has resorted to approximation or simulation. Remarkably, the E-step conditional 
expectations can also be expressed as convolutions of computable transition probabilities for any 
general BDP with arbitrary rates. This important observation, along with a convenient con- 
tinued fraction representation of the Laplace transforms of the transition probabilities, allows 
novel and efficient computation of the conditional expectations for all BDPs, eliminating the 
need for approximation or costly simulation. We use this insight to derive EM algorithms that 
yield maximum likelihood estimation for general BDPs characterized by various rate models, in- 
cluding generalized linear models. We show that our Laplace convolution technique outperforms 
competing methods when available and demonstrate a technique to accelerate EM algorithm 
convergence. Finally, we validate our approach using synthetic data and then apply our methods 
to estimation of mutation parameters in microsatellite evolution. 



Keywords: Birth-death process, EM algorithm, MM algorithm, maximum likelihood estimation, 
continuous-time Markov chain, microsatellite evolution 



1 Introduction 



A birth-death process (BDP) is a continuous-time Markov chain that models a non-negative integer 



number of particles in a system (Feller , 1971 ). The state of the system at a given time is the number 
of particles in existence. At any moment in time, one of the particles may "give birth" to a new 
particle, increasing the count by one, or one particle may "die", decreasing the count by one. BDPs 
are popular modeling tools in a wide variety of quantitative disciplines, such as population biology, 



genetics, and ecology (Thorne et al, 1991 Krone and Neuhauser, 1997 Novozhilov et al 2006). For 



example, BDPs can characterize epidemic dynamics, (Bailey, 1964 Andersson and Britton 2000), 



speciation and extinction (Nee et al, 1994 Nee, 2006), evolution of gene families (Cotton and Page 



2005 Demuth et al , 2006 ) , and the insertion and deletion events for probabilistic alignment of DNA 



sequences (Thorne et al, 1991 Holmes and Bruno 2001). 

Traditionally, most modeling applications have used the "simple linear" BDP with constant per- 
particle birth and death rates, which arises from an assumption of independence among particles 
and no background birth and death rates. When individual birth and death rates instead depend 
on the size of the population as a whole, the model is called a "general" BDP. Previous statistical 
estimation in BDPs has focused mainly on estimating the constant per-particle birth and death rates 
of the simple linear BDP based on observations of the number of particles over time. However, the 
simple linear BDP is often unrealistic, and nonlinear dependence of the birth and death rates on the 
current number of particles provides the means to model more sophisticated and realistic patterns of 
stochastic population dynamics in a wide variety of biological disciplines. For example, populations 
sometimes exhibit logistic-like growth as their number approaches the carrying capacity of their 



environment (Tan and Piantadosi, 1991). In genetic models, the rate of new offspring carrying an 
allele often depends on the proportions of both individuals already carrying the allele and those 



who do not ( Moran , 1958 ) . In coalescent theory, the rate of coalescence changes with the square of 



the number of lineages (Kingman, 1982). In addition, researchers may wish to assess the influence 



of covariates on birth and death rates by fitting a regression model (Kalbfleisch and Lawless, 1985 



Liu et al, 2007). 



Progress in estimating birth and death rates in BDPs has also typically been limited to contin- 



uous observation of the process (Moran 1951, 1953 Anscombe 1953 Darwin 1956 Wolff, 1965 



Reynolds 


1973 


Keiding 


1975 



at discrete times through longitudinal observations. Estimating transition rates in continuous-time 
Markov processes using discrete observations is difficult since the state path between observations 
is not observed. Furthermore, direct analytic maximization of the likelihood for general BDPs 
remains infeasible for partially observed samples since the likelihood usually cannot be written in 
closed- form. Despite these challenges, several researchers have made progress in estimating pa- 



rameters of the simple linear BDP under discrete observation (Keiding, 1974 Thorne et al 1991 



Holmes and Bruno 2001 Rosenberg et al 2003 Dauxois, 2004). However, none of these develop- 



ments provides a robust method to find exact maximum likelihood estimates (MLEs) of parameters 
in discretely observed general BDPs with arbitrary birth and death rates. 

A major insight comes from the fact that the likelihood of the continuously observed process has 
a simple form which easily yields expressions for estimation of rate parameters. This fact is the basis 
for expectation-maximization (EM) algorithms for maximum likelihood estimation in missing data 



problems (Dempster et al, 1977). In finite state-space Markov chains, the relevant conditional 



expectations (the E-step of the EM algorithm) can often be computed efficiently, and several 



researchers have derived EM algorithms for estimating transition rates in this context (Lange 



1995a 


Holmes and Rubin, 2002 


Hobolth and Jensen 


2005 


Bladt and Sorensen 


2005 


Metzner 



et al , 2007 ) . Unfortunately, finding these conditional expectations for general BDPs poses challenges 



since the joint distribution of the states and waiting times (or its generating function) is usually not 



available in closed- form. Notably, Holmes and Bruno (2001); Holmes and Rubin (2002) and Doss 



et al (2010) are able to find analytic expressions or numerical approximations for these expectations 
in EM algorithms for certain BDPs whose rates depend linearly on the current number of particles. 
While these developments are promising, there remains a great need for estimation techniques that 
can be applied to more sophisticated BDPs under a variety of sampling scenarios. Indeed, more 



complex and realistic models like those reviewed by Novozhilov et al ( 2006 ) may be of little use to 
applied researchers if no practical method exists to estimate their parameters. 

Here we seek to fill this apparent void by providing a framework for deriving EM algorithms 
for estimating rate parameters of a general BDP. We first formally define the general BDP and 
give an exact expression for the Laplace transform of the transition probabilities in the form of 
a continued fraction. We then give the likelihood for continuously-observed BDPs and outline 



the EM algorithm. Next, we describe a novel method to efficiently compute the expectations of 
the E-step for BDPs with arbitrary rates. Since these expectations are convolutions of transition 
probabilities, we perform the convolution in the Laplace domain, and then invert the Laplace 
transformed expressions to obtain the desired conditional expectation. This technique obviates 
the costly numerical integration or repeated simulation that has plagued previous approaches. We 
provide examples of the maximization step for several different classes of BDPs and demonstrate 
a technique for accelerating convergence of the EM algorithm. We show that our method is faster 
than competing techniques and validate it using simulated data. Finally, we conclude with an 
application that analyzes microsatellite evolution and answers an open question in evolutionary 
genomics. 



2 General BDPs and their EM algorithms 
2.1 Formal description and transition probabilities 

Consider a general BDP X{t) counting the number of particles k in existence at times r > 0. 
From state X{t) = k, transitions to state k + 1 happen with instantaneous rate A^, and transitions 
to state k — 1 happen with instantaneous rate /x^. The transition rates Afc and /ifc may depend 
on k but are time- homogeneous. As we show below, it is often necessary to evaluate finite-time 
transition probabilities to derive efficient EM algorithms for estimation of arbitrary birth and death 
rates in general BDPs. This proves useful both in completing the E-step of the EM algorithm and 
in computing incomplete data likelihoods for validation of our EM estimates. For a starting state 
i >0, the finite-time transition probabilities Pij^r) = Pr(X(T) = j \ X(0) = i) obey the system of 
ordinary differential equations 

^^'f^^^ = /iiPj i(r) - XoPi oir), and 

(1) 



dP.,,(r) 
dr 



Aj-iPi,j-i(r) + //j+iPij+i(r) - (Aj + fij)Pij{T), 



for j > 1 with Pj,i(0) = 1 and Pij(O) = for f / j (Feller, 1971). 



For some simple parameterizations of Xk and /Xfc, closed- form solutions exist for the transition 



probabilities Pij{T), but this is not possible for most models. Karlin and McGregor (1957) show 



4 



that for any parameterization of A^. and //fc, it is possible to express the transition probabilities in 
terms of orthogonal polynomials. However, in practice these special polynomials are difficult to 
find, and even when they are available, they rarely yield solutions in closed-form or expressions 



that are amenable to computation (Novozhilov et al, 2006, Renshaw, 2011). In contrast, the con- 



tinued fraction method we outline below does not require additional model-specific insight beyond 
specification of and /ifc. 

To solve for the transition probabilities, it is advantageous to work in the Laplace domain 



(Karlin and McGregor, 1957). This transformation also proves essential in maintaining numerical 



stability of transition probabilities in general BDPs and in computing the conditional expectations 
necessary for the EM algorithm derived in a subsequent section. Laplace transforming equation ([T]) 
yields 



•s/j,o(s) - (5jo = - Ao/j,o(s), 



(2) 



where fi,j{s) is the Laplace transform of Pij{T) and 5ij = 1 if i = j and zero otherwise. Letting 
i = and rearranging Q, we obtain the recurrence relations 



/o,o(s) 
/o,j-i(s) 



S + Xq- fli 



/o.i(^) 
fo,o{s) 



, and 



A 



(3) 



s + /x, + A,-^,+i(^|g^ 



We can inductively combine these expressions for j = 1,2,3,... to arrive at the well-known gener- 
alized continued fraction 



/o,o(s) 



1 



s + Ao 



Ao/^i 



(4) 



s + Al + /ii 



A1/U2 



S + X2+ H2 



This is an exact expression for the Laplace transform of the transition probability Po,o (''")• In @, 
let ai = 1 and aj = — Aj_2/i_,-i, and let 61 = s + Aq and bj = s + Aj_i + fJ-j-i for j > 2. Then Q 
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becomes 



/o,o(s) 



ai 



We can write this more compactly as 



a2 



62 + 



as 
bs + - 



(5) 



/o,o(s) 



Qi 02 as 

&1+ ^2+ ^S + 



(6) 



The feth convergent of fofi{s) is 



t{k),^ _ ai 02 a/c _ ^fc(s) 



61+62+ 6fc Bk{s) 



(7) 



where ^fc(s) and Bk{s) are the numerator and denominator of the rational function /q'q- The 



transition probabilities Pij{T) for i,j > can be derived in continued fraction form by combining 
((2j) and ^ to obtain 



n 



\k=i / 



Bj{s) Bi{s)ai+2 ai+3 



Bjjs) Bj{s)aj+2 aj+3 
Bj+i{s)+ bj+2+ bj+3+ 



for j < i, 



(8) 



for i < j, 



(Murphy and O'Donohoe, 1975 Crawford and Suchard, 2011) 



Although the Laplace transforms of the transition probabilities are generally still not available 
in closed-form, a continued fraction representation is desirable for several reasons: 1) continued 
fraction representations of functions often converge much faster than equivalent power series; 2) 
there are efficient algorithms for evaluating them to a finite depth; and 3) there exist methods 



for bounding the error of truncated continued fractions (Bankier and Leighton, 1942 Wall, 1948 



Blanchl fT964l ILorentzen and Waadelandl fT992l ICraviotto et all fT993l lAbate and Whittl fT999 



Cuyt et al , 2008 1 . For an arbitrary BDP, we recover the transition probabilities through numerical 



inversion of the Laplace-transformed expressions. We evaluate the continued fraction to a moni- 



6 



tored depth that controls the overall error and generates stable approximations to the transition 



probabilities unattainable by previous methods (Murphy and O'Donohoe, 1975 Parthasarathy and 



SudheshI |2006t [Crawford and Suchardj |201lD . 

The ability to compute transition probabilities for general BDPs with arbitrary rate parame- 
terizations proves useful in two ways. First, if we interpret finite-time transition probabilities as 
functions of an unknown parameter vector 0, then Pa,b{t) given 6 returns the likelihood of a dis- 
crete observation from a BDP such that X(0) = a and X{t) = b, where the trajectory in time t 
between a and b is unobserved. Second, transition probabilities play an important role in computing 
conditional expectations of sufficient statistics, as we shall see below. 

2.2 Likelihood expressions and surrogate functions 

[Figure 1 about here.] 

With a formal description of a general BDP and the finite-time transition probabilities in 
hand, we now proceed with our task of estimating the parameters of a general BDP using discrete 
observations. Given one or more independent observations of the form Y = (X(0) = a,X{t) = b) 
from a general BDP, we wish to find maximum likelihood estimates of the rate parameters and 
/ifc for A; = 0, 1, 2, . . .. We will assume that the birth and death rates at state k depend on both k 
and a finite-dimensional parameter vector 0, so that the form of Xk{d) and fJ-ki^) is known for all 
k. 

For a single realization of the process starting at X(0) = a and ending at X{t) = 6, let be 
the total time spent in state k. Let be the number of "up" steps (births) from state k, and let 
Dk be the number of "down" steps (deaths) from state k. Let the total number of up and down 
steps in a realization of the process be denoted hy U = Ylk^o Uk and D = YlT=o respectively. 
We also define the total particle time, 

T'particio = / X{t) dr = kTk, (9) 

that counts the amount of time lived by each particle since time r = 0. Of course, the total elapsed 
time is t = Yl'^Lo'^k- We demonstrate these concepts schematically in Figure [Tj 
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The log-likelihood for a continuously observed process takes a simple form when we sum over 



all possible states k (Wolff, 1965): 



(10) 



k=0 



However, when a BDP is sampled discretely such that only X{0) = a and X{t) = b are observed, the 
quantities U^, Dj., and are unknown for every state k, and we cannot maximize the log-likelihood 



(10) without them. 

We therefore appeal to the EM algorithm for iterative maximum likelihood estimation with 



missing data (Dempster et al, 1977). In the EM algorithm, we define a surrogate objective function 



Q by taking the expectation of the complete data log-likelihood (10), conditional on the observed 



data Y and the parameter values 0^"*) from the previous iteration of the EM algorithm (the E-step). 
Then we find the parameter values that maximize this surrogate function (the M-step). This 

two-step process is repeated until convergence to the maximum likelihood estimate of 6. Taking 
the expectation of (10) conditional on Y and 6^'^\ we form the surrogate function Q: 



Q{e I e^™)) =E[£(6>) I Y,6'(™)] 



^E([/fc|Y)log[Afc(0)] +E(I)fc|Y)log [fiu{e)] -E(rfc|Y)[Afc(0) + /ifc(0)], 



(11) 



fc=0 



where for clarity we have omitted the dependence of the expectations on the parameter value 0^™) 
from the mth iterate. In general, we assume that the maximum likelihood estimator exists; see 



Bladt and Sorensen (2005) for a discussion of the issues of identifiability, existence, and uniqueness. 



2.3 Computing the expectations of the E-step 

Computing the expectations of Uk, D^, and Tk in the E-step is difficult in birth-death estimation 
since the unobserved state path and waiting times are not independent conditional on the observed 



data Y. Doss et al (2010) adopt an approach for linear BDPs that combines analytic results with 



simulations. For some models, these authors are able to derive the generating function for the joint 
distribution of U, D, Tparticiei and the state path conditional on X{0) = a and can manipulate this 
generating function to complete the E-step. For a more complicated linear model. Doss et al] resort 
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to approximating the relevant conditional expectations by simulating sample paths, conditional on 



Y (Hobolth 2008). 



Our solution is to recognize that we do not need to know very much about the missing data 
to find the conditional expectations used in the sufficient statistics above. In fact, the transition 
probabilities are all that we require. The following integral representations of the conditional 
expectations in the EM algorithm will prove useful: 



E{Uk\Y) 
E{Dk\Y) 
E{Tk\Y) 



/ Pa,k{'^)XkPk+i,b{t - t) dr 
Jo 

Padt) 

/ Pa,k{T)lJ.kPk-l,h{t - t) dr 

Jo 

Pa,b{t) 

[ Pa,k{^)Pk,b{t - ^) dr 
Jo 

Pa,b{t) 



and 



(12a) 
(12b) 
(12c) 



These formulas have appeared in many types of studies related to EM estimation for continuous- 



time Markov chains (Lange, 1995a Holmes and Rubin 2002 Bladt and Sorensen 2005 Hobolth 



and Jensen, 2005 Metzner et al, 2007). For general BDPs whose transition probabilities must be 



computed numerically, numerical integration over the product of the densities can be computation- 
ally prohibitive. 



However, the numerators in ( 12 ) a-c are convolutions of integrable time-domain functions. Since 



the Laplace transforms fa,b{s) of these transition probabilities are available and easy to compute, 
we take advantage of the Laplace convolution property, arriving at the representations 



E([/fc|Y 
E(L»fc|Y 

IE(Tfc|Y 



) = Aa 



c- 



fa,k{s) fk+l,b{s) (t) 



fj-k — 
£-1 



Pa,h{i) 
/a,fc(s) fk-l,b{s) (t) 



Pa,bii) 
fa,kis) /fc,b(s) (t) 



-, and 



Pa,b{t) 



(13a) 
(13b) 
(13c) 



where £ ^ denotes inverse Laplace transformation. Although these formulas are equivalent to (12) 



they offer substantial time savings over computing the integral directly, and render tractable the 
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computation of expectations in the EM algorithm for arbitrary general BDPs. 



To calculate the numerators of (13), we use the Laplace inversion method popularized by Abate 



and Whitt (1992 1995). This involves a Riemann sum approximation of the inverse transform 



that stabilizes the discretization error and is amenable to series acceleration methods ( [Abate and 



Whitt 


1999 


Press 


2007 



Press, 2007). To evaluate the continued fraction Laplace transforms fa,b{s), we use 



the modified Lentz method (Lentz, 1976, Thompson and Barnett 1986; Press, 2007). 



2.4 Maximization techniques for various BDPs 

In contrast to the generic technique outlined above for computing the expectations of the E-step, 
the M-step depends explicitly on the functional form of the birth and death rates Afc(0) and /ifc(0). 
Here we give several representative examples of BDPs and techniques for completing the M-step of 
the EM algorithm, such as analytic maximization, minorize-maximize (MM), and Newton's method. 

2.4.1 Simple linear BDP 

In the simple linear BDP, births and deaths happen at constant per-capita rates, so Afc = kX and 
fik = kfi. The unknown parameter vector is = (A,/i), and the surrogate function becomes 



Q{e) = J2^iUk\Y) log[kX] + E{Dk\Y) log[M - E{Tk\Y)kiX + /i). 



(14) 



fe=0 



Taking the derivative of ( 14 ) with respect to the unknown parameters, setting the result to zero. 



and solving for A and fi gives the M-step updates 

_ E{U\Y 



lE(rparticlc| Y 

_ E{D\Y) 



-, and 



(15a) 
(15b) 



IE(Tparticlc|Y) 

These updates correspond to the usual maximum likelihood estimators in the continuously observed 



process (Reynolds, 1973). Note that the transition probabilities Pa,b{t) in the denominators of the 



expectations in (12) cancel out in (15a) and (15b). When this is the case, transition probabilities 



are not necessary to derive an EM algorithm. 
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2.4.2 Linear BDP with immigration 

Sometimes populations are not closed, and new individuals can enter; we call this action "immigra- 
tion." Another interpretation arises in models of point mutations in DNA sequences. Suppose new 
mutations arise in a DNA sequence via two distinct processes: one inserts new mutants at a rate 
proportional to the number already present, and the other creates new mutations at a constant 
rate, regardless of how many already exist. To model this behavior, we augment the simple linear 
BDP above with a constant term v representing immigration, so that Afc = k\ + u and /i^ = kfi. 
The log-likelihood becomes 

oo 

^{e) = Y,Uk^og{kx + u) + Dk\og{^J)-Tk[k{x + ^J) + ly]. (le) 

fe=0 

Unfortunately, if we take the derivative of the log-likelihood with respect to A or zv, the unknown 
appears in the denominator of the terms of the infinite sum. However, since each summand is a 
concave function of the unknown parameters, we can separate them in a minorizing function H 
such that for ah 6, H{e\e^"^^) < £{6) and H{e^"''>\e^"'^) = £{0^"^^) as follows: 

(17) 

= bfc log (pfeA) + (1 - Pk) log ((1 - Pk»] + Dk log(^) - [k{\ + + v]Tk, 

k=o 

where 

- A;A(-) + i/M ■ ^ ^ 

Then letting Q{e \ 6^""^) = E (^H{e) \ Y,^^™)) be the surrogate function, this minorization forms 
the basis for an EM algorithm in which a step of the minorize-maximize (MM) algorithm takes 



the place of the M-step, and the ascent property of the EM algorithm is preserved (Lange, 2010). 
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Maximizing Q with respect to A and v yields the updates 



^PfcE(C/fc|Y) 



^(m+l) _ fc=0 



E(TparticlelY) 



, and 



^(l-pfc)E(C/fc|Y) 

(m+l) _ k=0 



(19a) 



(19b) 



Expression (19a) is similar to (15a), the update for A in the simple BDP. The difference lies in that 



each E(f/fc|Y) in this case is weighted by the proportion of additions at state k due to births, not 



immigrations. The update for /x is the same as (15b). 



2.4.3 Logistic/restricted growth 

To illustrate an EM algorithm for more complicated rate specifications in which no MM update 
is evident and the rates no longer depend on the current state A: in a linear way, we examine a 
model for restricted population growth. Typical deterministic population models often incorporate 
limitations on population size due to the carrying capacity K of the environment. One famous 



example is the logistic model of population growth (Murray, 2002). Continuous-time stochastic 



analogs have previously required a finite cap on population size (Tan and Piantadosi 1991). These 



stochastic models roughly mimic the behavior of the deterministic model for population sizes below 
K, but are limited because they do not allow growth beyond K. Here we present a model which 
supports transient growth beyond the carrying capacity, but where the population size tends to a 
balance between restricted growth and death. 

Suppose births are cooperative, requiring two parents, but fecundity decays as the number of 
extant particles increases, and death remains an independent process such that A^ = Xk'^e~^^ and 
Hk = kfi. Here, we can interpret the carrying capacity roughly as the population size k > at 
which Afc w /Life. Ignoring irrelevant terms, the surrogate function becomes 

oo 

Q{9 I = ^E([/fc|Y)[log(A) - m + E{Dk\Y) log(/i) - E{Tk\Y)[Xk^e-^'' + fe/i]. (20) 

k=0 



Since A and (3 appear together, we opt for a numerical Newton step. The gradient of Q with respect 
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to these parameters is 



k=0 



f2 [kHUk\Y) + Xk^e-f^^'EiTklY) 



k=0 



(21) 



and the Hessian is 



H 



E(;7|Y) 



k=0 

oo 



k=0 



k=0 



J 



(22) 



Then we update these parameters by 





(23) 



The ascent property is preserved when a Newton step is used in place of an exact M-step (Lange 



1995a). The update for fi is the same as (15b). 



2.4.4 SIS epidemic models 

Under a very common epidemic model, members of a finite population of size are classified as 



either "susceptible" to a given disease or "infected" (Bailey 1964 Andersson and Britton, 2000). 



Susceptibles become infected in proportion to the number of currently infected in the population, 
and infecteds revert to susceptible status with a certain rate independent of how many infecteds 
there are. This idealized susceptible-infectious-susceptible (SIS) infectious disease model specifies 
a general birth-death process in which we track the number of infecteds. Let = /3k{N — k)/N 
be the rate of new infections when there are already k infected in the population. Let /Xfc = ^k/N 
be the rate of recovery of infecteds to susceptibles. Then if = {(3,^), we have 



TV 



= J2 HUklY) log(/3) + E{Dk\Y) log(7) - E{n\Y){k{N - k)P + kj)/N, (24) 



k=0 
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and the update for /3 is 



= . (25) 



- k)kE{Tk\Y) 



k=0 



The update for 7 is 



]E(rparticle|Y) 

2.4.5 Generalized linear models 

Our general framework allows assessment of the influence of covariates on the rates of a general BDP 
in a novel way. Suppose we sample observations from independent processes Xj(r), i = 1,. . . ,N 
and observe Yj = (Xj(0), Xi{ti)) associated with d covariates Zj = (zn, . . . , ZidY. These processes 
may represent different subjects in a study. We model the birth and death rates Ajjt and f^ik for 
each process/subject Xi as functions of Zj and unknown d-dimensional regression coefficients Ox 
and 0^ in a generalized linear model (GLM) framework. We link 

log{Xik) = g{k,zl9x) and log(Mifc) = z*6>^), (27) 

where g{-) and h{-) are scalar-valued functions. We note the possibility that covariates may differ 
between Ox and 0^ through trivial modification; to ease notation, we do not explore this direction. 
Given N independent processes, we sum log-likelihoods to arrive at the multiple-subject surrogate 
function: 

N 00 

Q(0|0M) =J2Y1 \mk\Y^)9(,k,zl0x) + EiD,\Y,)h{k,zle^) 

i=i k=o (28) 



K{Tk\Yi) (^e^(*^'^i^A) + ^h{k,zl9^)^ 



Although we cannot usually maximize this surrogate function for all elements of {9x^9 n) simulta- 
neously, a Newton step is often straightforward to derive. 

As an example, consider generalized linear model extension of the simple linear BDP in which 

log(Aifc) = log(fe) + z*6>A, and log(/Xife) = log(A:) + z*6>^. (29) 
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Taking the gradient of the corresponding surrogate function Q with respect to the parameters Ox 
yields 

TV 

Ve.Q = ^iU\Yi)zi - e^'^^E(Tparticic| Y,)z, (30) 

i=l 

and the second differential (Hessian) of Q is 

N 

di.Q = -Y. e'^^'IE(rparticle| Yi)ziZ*. (31) 

j=l 

Combining these, we arrive at the Newton step for the parameter vector Ox: 

0("^+i)^0M_(d2^g)-iv,^g. (32) 

A similar update can be found for 0^. These updates are examples of the gradient EM algorithm 



for regression in Markov processes described by Wanek et al (1993) and Lange (1995a I . It is worth 



noting that the Hessian matrix d^^Q can become ill-conditioned, making it difficult to invert for 



the Newton step in (32) for some problems. Unfortunately there is no quasi-Newton option since 
in general E(Tparticie| Y)e^^^^ is unbounded. An alternative to inversion of the Hessian matrix is 
cyclic coordinate descent in which a Newton step is performed for each coordinate Oj individually. 
This carries the advantage of avoiding matrix inversion, but convergence is slower and the ascent 
property must be checked at each Newton step. 

2.5 Implementation 

Before presenting simulation results and our application to microsatellite evolution, we briefly 
outline some implementation details that ease our subsequent analyses. 

2.5.1 E-step acceleration 

The E-step in these EM algorithms for BDP estimation usually involves infinite weighted sums of 
the conditional expectations E({7;s|Y), E(L'fc|Y), and E(T,fc|Y). For example, when estimating A in 
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the simple linear BDP, we must evaluate 



E(C/|Y) = ^E(t/fc|Y) 



^Afc£ ^ /a,fc(s) fk+i,b{s) (t) 



k=0 



(33) 



k=0 



Fortunately, the conditional expectations of Uk, Dk, and are usually small for k <^ min(a, 6) 



and k ^ max(a, b), so it is possible to replace the infinite sum in (33) by a finite one. We find an 



additional increase in computational efficiency by exchanging the order of Laplace inversion and 



summation. Then (33) becomes 



E([/|Y) 





^max 






>^kfa,k{s)fk+l,b{s) 






k—k ■ 







(34) 



where we choose k^i^ to be the largest k < min(a, b) such that Xk\fa,k{s) — fk+i,a\ < and fcmax 
to be the first k > max(o, 6) such that Xk\fa,k{s)fk+i,b{s)\ < 10^^. In practice, we rarely need to 
compute expectations for k less than min(a, b) — 10 or greater than max(a, b) + 10. 

2.5.2 Quasi-Newfton acceleration of EM iterates 

EM algorithms are notorious for slow convergence, especially near optima. When appropriate, we 



exploit the quasi- Newton acceleration method introduced by Lange ( 1995b ) in our implementations 
Other acceleration methods exist, and may give better results, depending on the problem (Lange 
1995a Louis, 1982; Meilijson, 1989 Jamshidian and Jennrich, 1993). Figure [2] shows the log- 
likelihood function and iterates for the basic EM and accelerated EM methods in the simple linear 
model. Since the quasi-Newton acceleration method does not guarantee that the likelihood increases 
at each step, "step-halving" is occasionally necessary to achieve ascent. Note that this requires 
likelihood evaluation at least once per iteration. Our approach is advantageous in that we can 



efficiently calculate this likelihood (transition probability) for any general BDP (Crawford and 



Suchard 2011) 



[Figure 2 about here.] 
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2.5.3 Asymptotic variance of EM estimates 



Finding the observed information matrix for an EM estimate can be challenging. Louis (1982) 



gives formulae for the observed information, which Doss et al (2010) use to derive analytic expres- 



sions for the observed information for very simple BDPs. However, analytic expressions for the 
asymptotic variance are generally hard to find for more complicated models. We instead turn to 



the supplemented EM (SEM) algorithm of Meng and Rubin (1991), which computes the informa- 



tion matrix of the EM estimate of 6 after the MLE 6 has been found. The observed information is 
1(6) = -(fQ{e\e)(I-dM{e)), where M(6>) is the EM algorithm map such that = M(0('^)). 

We numerically approximate the diff'erential dM at the termination of the EM algorithm. 

We note also that since we are able to calculate transition probabilities directly, the observed 
data log-likelihood is easily computed as 



N 



(0) = ^logP„„„,(t, 



(35) 



i=l 



where = Xi{0) and bi = Xi{ti). As an alternative to the approaches outlined above, we can 
calculate the Hessian using purely numerical techniques. If H(0) = d?i(6) is the numerical Hessian 
evaluated at the estimated value 6, then I ~ — H(0). 



3 Results 



3.1 Laplace convolution E-step comparison 



To illustrate the computational speedup that the Laplace convolution formulae (13) and their 



acceleration in section 2.5.1 achieve over existing methods, we calculate conditional expectations 



for various BDP models for performing the E-step and report computing times in Table [2j The first 
method in the table employs rejection sampling of trajectories where we condition on the starting 



state, and reject based on the ending state (Bladt and Sorensen, 2005). The second method adapts 



an endpoint-conditioned simulation algorithm ( Hobolth , 2008 Hobolth and Stone , 2009 ) . The third 



considers naive time-domain convolution (Equation (|12|)) using the integrate function in R. Finally, 
we compute the same quantities via the Laplace-domain convolution method outlined in section 



2.3 In our implementations, we have made every efi^ort to reuse as much shared R code as possible. 
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with the aim of making the routines comparable. We consider four different BDPs. For a simple 
linear BDP and a linear BDP with immigration, we use the data Y = (^(0) = 19,X(2) = 27). 
Under a logistic model, the data are Y = {X{0) = 10,X{2) = 16), and for the SIS model the data 
are Y = {X{0) = W,X{2) = 31). We list all model parameter values in Table [2] 

As seen in Table H the Lapl ace convolution method is often more than 10 times faster than 
the other methods. In terms of time-performance, the endpoint-conditioned simulation stands as 
second best, achieving almost comparable speed in the logistic BDP. To interpret this finding, we 



recall that Hobolth ( 2008 ) constructs an endpoint-conditioned simulation for performing the E-step 
in finite state-space Markov chains. Therefore, to adapt this method we approximate each BDP 
by a Markov chain with a finite transition rate matrix. To choose the arbitrary dimension of this 
matrix we truncate the process at the first state k > max(a, b) such that Pa,k{t) < 10~^, and the 
resulting estimates agree substantially with the other methods. We are aware that the size of the 
rate matrix affects the speed of the simulation routine, so we wish to keep the matrix as small 
as possible. On the other hand, the matrix must remain large enough to include states that may 
be visited with high probability in a path from a to 6 over time t. For the logistic model, such a 
stringent upper bound lies just above the relatively small carrying capacity. However, endpoint- 
conditioned simulation completely fails for the SIS model, an issue we discuss later. Finally, and 
quite naturally, the two convolution methods arrived at nearly the same answer for each model; the 
difference is largely due to very different sources of numerical error, but at disparate computational 
costs. 



[Table 1 about here. 



3.2 Synthetic examples 



To evaluate the performance of our EM algorithms, we simulate discrete observations from several 
of the BDPs outlined above. For each sample, we draw starting points Xi(0) uniformly from the 
integers to 20, and times ti uniformly from 0.1 to 3. We then simulate a trajectory of the BDP 
and record the state Xj(tj). For the generalized linear model (GLM), we employ the simple linear 
parameterization with a log link with d = 2 covariates. We specify the covariates Zj = -21,2) as 
follows: ~ iV(l, 0-2), Zi^2 ~ N{2, a'^) for i = 1, . . . , N/2, Zi^i ~ iV(2, a"^) and Zi,2 ~ N{1, a^) for 
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i = N/2 + 1,...,N, where = 0.1. 

Table [3] reports the number of simulated observations, true parameter values, point-estimates, 
asymptotic standard error estimates for all model parameters. It is important to note that the 
MLEs can differ substantially from the parameter values used to perform the simulation, regardless 
of the algorithm used to find the MLEs. This is due to several factors, including: 1) missing state 
paths; 2) stochasticity of the BDP generating the state paths; 3) arbitrary choice of starting states 
Xi(0); and 4) finite sample sizes. Despite these limitations inherent in learning from partially 
observed stochastic processes, the point-estimates match the true parameter values rather well. 

[Table 2 about here.] 



3.3 Application to microsatellite evolution 



Microsatellites are short tandem repeats of characters in a DNA sequence ( Schlotterer , 2000 EL 



legren, 2004; Richard et al, 2008). The number of repeated "motifs" in a microsatellite often 



changes over evolutionary timescales. The molecular mechanism responsible for changes in repeat 



numbers is known as "polymerase slippage" (Schlotterer, 2000). Several researchers have proposed 



linear BDPs for use in analyzing evolution of microsatellite repeat numbers (Whittaker et al, 2003 



Calabrese and Durrett 2003 Sainudiin et al, 2004). However, many investigations demonstrate 



that microsatellite mutability depends on the number of repeats already present, motif size, and 



motif nucleotide composition (Chakraborty et al, 1997: Eckert and Hile, 2009 Kelkar et al, 2008 



Amos, 2010). Exactly how these factors affect addition and deletion rates remains an open question 



(Bhargava and Fuentes, 2010) 



To our knowledge, no previous study formulates or fits a general BDP in which motif size 
and composition are treated as a covariates in a generalized regression framework, despite the 



scientific interest in examining such effects on microsatellite evolution. Webster et al (2002) study 



the evolution of 2467 microsatellites common (orthologous) to both humans and chimpanzees, 
providing an ideal dataset for studying the influence of repeat number and motif size on addition 



and deletion rates. For each of these observed microsatellites, Webster et al (2002) record the motif 



nucleotide pattern and the number of repeats of this motif found in chimpanzees and humans, and 
estimate a mutability parameter that controls the rate of addition and deletion. 
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We now present an extended application of our BDP inference technique to chimpanzee-human 



microsatellite evolution, drawing on the data in Table 6 of the supplementary information in Web- 



ster et al (2002). We introduce several novel modeling and inferential techniques relevant to the 
study of microsatellites, and deduce the effect of motif size and composition on microsatellite ad- 
dition and deletion rates. While the likelihood takes a slightly more complicated form, our BDP 
regression technique is straightforward to implement and yields insight into the complicated process 
of microsatellite evolution. 

3.3.1 Evolutionary model 

To analyze the data as realizations from a BDP, we must acknowledge the evolutionary relationship 
between chimpanzees and humans. Suppose the most recent common ancestor of chimpanzees and 
humans lived at time t in the past, so that an evolutionary time of 2t separates contemporary hu- 
mans and chimpanzees. We note that under mild conditions, general BDPs are reversible Markov 



chains (Renshaw, 2011). Therefore, assuming stationarity of the chimpanzee microsatellite length 
distributions, we stand justified in reversing the evolutionary process from the ancestor to chim- 
panzee, so that for estimation purposes we may regard humans as direct descendants of modern 
chimpanzees (or vice- versa) over an evolutionary time of 2t. If C is the number of repeats in a chim- 
panzee microsatellite and H is the number of repeats in the corresponding human microsatellite, 
then the likelihood of the observation Y = (C, H) is 

oo 

Pr{Y) =^7TkPk,c{t) PkMt) 

k=0 

oo 

= 7rc^PcMt)Pk,H{t) (^6) 
fe=0 

= TTcPc,H{2t), 

where vr^ is the equilibrium probability of the microsatellite having k repeats. The second line 
follows by reversibility and the third by the Chapman-Kolmogorov equality. Therefore, the log- 
likelihood of the observation Y is now logvrc + £{6;Y). Figure [s] shows a schematic representation 
of this reversibility argument. 

[Figure 3 about here.] 
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3.3.2 BDP rates and equilibrium distribution 



The observed data for microsatellite i are Yj = (Xj(0), Xj(l)), where Xi(0) is the number of repeats 
observed in chimpanzees, is the number of repeats observed in humans, and the evolution- 

ary time separating humans and chimpanzees is scaled to unity. In addition to the evolutionary 



relationship explained above, there are other complications: in the Webster et al (2002) dataset 



it is evident that microsatellites with small numbers of repeats are not detected. Rose and Falush 



( 1998 ) argue that there is a minimum number of repeats necessary for microsatellite mutation via 



polymerase slippage. Sainudiin et al (2004) interpret this finding as justification for truncating the 



state-space of BDP at Xminj so that X{t) > 

Xmin- To avoid questions of ascertainment bias (see 



e.g. Vowles and Amos (2006)), and to make our results comparable to those of past researchers. 



we define a microsatellite to be a collection of more than Xmm repeated motifs, where Xmin is 9 for 
repeats of size 1, 5 for repeats of size 3 and 4, and 2 for repeats of size 5. 



Researchers have also observed that microsatellites do not tend to grow indefinitely (Kruglyak 



et al, 1998). The maximum number of repeats in the Webster et al dataset is 47. This suggests 



a finite nonzero equilibrium distribution of microsatellite lengths. To achieve such an equilibrium 
distribution, we preliminarily view the evolution as a linear BDP with immigration on a state-space 
that is truncated below Xmm- R is reasonable to assume that rates of addition and deletion depend 
linearly on how many repeats are already present. Then for a microsatellite that currently has k 
repeats, the birth and death rates are 



k\ -\- X k ^ ■^min 
/c < 3^min 



and fik = < 



kfJ, k > Xmin 
k 3^min- 



(37) 



This gives a geometric equilibrium distribution for the number of repeats: 



A\ /A 



k < Xmin; 



(38) 



when X < fi (Renshaw, 2011). We choose this simple model so that the BDP has a simple closed- 



form nonzero equilibrium solution that is easy to incorporate into the log-likelihood. Note that the 
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constraint A < /i does not mean that the rate of microsatelHte repeat addition is always less than 
the rate of deletion, since it is possible that \k > fJ-k for small k. Additionally, X < /j, does not 
mean that the number of repeats in a microsatellite tends to zero over long evolutionary times — 



the equilibrium distribution (38) assigns positive probability to all repeat numbers greater than or 

equal to Xmin- 

3.3.3 Likelihood and surrogate function 

Now we augment the log- likelihood with the log-equilibrium probability of observing Xi{0) chim- 
panzee repeats 

N 

F{e)=^log7Tx,^0)+m^^), (39) 



i=l 



where i{0; Yj) is equivalent to ( 10 ) . Including the influence of the equilibrium distribution is similar 



to imposing a prior distribution on A and /x. To ensure the existence of the equilibrium distribution 



(38), we must also incorporate the constraint X < fi. To achieve maximization of the augmented 



log-likelihood (39) under this constraint, we impose a barrier term of the form jlog{fi — A). By 
iteratively maximizing and sending the barrier penalty 7 — )• 0, we can achieve maximization under 
the inequality constraint. More formally, if we let 



N 



H{e) = Y,HTrx,^o) + mYi)] +7log(/i-A), (40) 



i=l 

then 

argmax H{0) — )• argmax F{6) (41) 


under the constraint A < /i as 7 — )• 0. 

To incorporate and evaluate the influence of motif size and composition heterogeneity, we now 
treat A and /i in the ith observation as functions of the covariate vector Zj in a general BDP. 
Suppose microsatellite i has motif size r^. We code the vectors Zj as follows: 



{1,0,0, Pa, PcPtY ri = l 

il,l,0,Pa,Pc,PtY n = 2 (42) 
{1,0, 1, Pa,Pc,Ptf ri > 3 
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where px is the proportion of x nucleotides per repeat. We define a single parameter a that controls 
the difference between A and Then in the ith microsatellite, the complete model becomes 

log(Afc,i) = log(A: + l) + a + z\e and log{fik,i) = log(fc) + zjO. (43) 

Therefore (a, 0)* is the 7x1 vector of unknown parameters. Putting all this together, the surrogate 
function becomes 



N 



Q(6>|6>("^)) oc ^Xi(0)Q + log(l-e") + 



i=l 



J2nUk\Yi){a + zle)+E{Dk\Yi)zle 
^=0 (44) 



+ 7log(-a), 



where a < since A < /i, and we send the penalty 7 — )• as the algorithm converges. We use a 
gradient EM algorithm to find the MLE of (q, 0). 

Table |4] reports the parameter estimates, along with asymptotic standard errors. From these 
results, we infer that motifs of different sizes and composition have different characteristics under 
our evolutionary model. Specifically, A and fj, are greatest for dinucleotide repeats, as compared to 
motifs with one or at least three repeats. Motifs consisting mostly of A and T nucleotides also give 
rise to higher A and /x. Table [T] shows the estimated A and /i for each unique motif pattern in the 



dataset. These conclusions are largely consistent with the descriptive results obtained by Webster 



et al (2002). Our analysis also provides a natural probabilistic justification for the existence of 
a finite nonzero equilibrium distribution of microsatellite repeat numbers and a formal statistical 
framework for deducing the effect of motif size and repeat number on mutation rates. 

[Table 3 about here.] 



4 Discussion 

Application of stochastic models in statistics requires a flexible and general approach to parameter 
estimation, without which even the most realistic model becomes unappealing to researchers who 
wish to learn from the data they have collected. Estimation for continuously observed BDPs is 
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straightforward and well-established. For partially observed BDPs, our approach is unique because 
it requires only two simple ingredients: the functional form of the birth and death rates Afc(0) and 
for all k, and an exact or approximate M-step. A third ingredient is optional: the Hessian 
of the surrogate function is useful when asymptotic standard errors are desired. However, this 
matrix can often be approximated numerically upon convergence of the EM algorithm, since the 



observed-data likelihood is available numerically via (35). With these ingredients in hand, even 
elusive general BDPs become tractable. 

In previous work on estimation for BDPs, completion of the E-step typically relies on time- 
domain numerical integration or simulation of BDP trajectories. As we show in Table [2| both 
rejection sampling and endpoint-conditioned simulation can occasionally perform satisfactorily, es- 
pecially in comparison to time-domain convolution. However, endpoint-conditioning is designed for 
finite state-space Markov chains, and it relies on a matrix eigendecomposition to calculate tran- 
sition probabilities. As we show for the SIS model, this matrix becomes nearly singular, causing 
the simulation algorithm to fail, even when we choose parameter values that are not biologically 
unreasonable. The Laplace convolution in the E-step of our algorithm is more generic with equiv- 
alent or better performance. For this reason, a variation on our Laplace convolution method for 
computing the E-step may offer further use in estimation for non-BDP finite Markov chains as 
well, such as nucleotide or codon substitution models. For some linear BDPs, the availability of 
a generating function furnishes analytic E- and M-steps yielding very fast parameter updates in 



closed-form (Doss et al, 2010). For some models, these tools provide the asymptotic variance of the 
MLE in closed-form. However, for the majority of BDPs, we must return to the Laplace convolution 
method outlined in this paper. 

If one cannot find analytic parameter updates in the M-step, several options remain available. 



With a minorizing function as in section[2.4.2 an EM-MM algorithm is viable. Further, one or more 



numerical Newton steps offers an alternative, as in sections 2.4.3 and 2.4.5 One may employ other 



gradient-based methods as well. Although the MM update derived for the BDP with immigration 



(section 2.4.2) is appealing in its simplicity, multiple minorizations of the likelihood can result in 
very slow convergence, since the surrogate function lies far from the true likelihood for most values 
of 9. In addition, Newton steps that require matrix inversion may suffer since the Hessian of the 
surrogate can become ill-conditioned. 
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Even with the substantial speedup offered by our Laplace convolution method for performing the 
E-step and quasi- Newton acceleration of the EM iterates, our algorithms can move slowly toward 
the MLE. Here, naive numerical optimization of the incomplete data likelihood can sometimes 
run computationally faster. However, such techniques perform very poorly when the number of 
parameters increases and they often require specification of tuning constants in order to reach the 
global optimum. For BDP estimation problems, EM algorithms offer several other advantages over 
naive numerical optimization, and these benefits are especially stark when the M-step is available 
in closed- form. First, when the log- likelihood is locally convex, the EM algorithm is robust with 
respect to the initial parameter values near the maximum, and EM algorithms generally do not need 
tuning parameters. Further, the ascent property ensures the iterates will approach a maximum. 
Perhaps the most important reason to consider EM algorithms is that they can accommodate high- 
dimensional parameter spaces without substantially increasing the computational complexity of the 
algorithm. This is especially useful in models with many unknown parameters when performing 



regression with covariates (section 2.4.5), or our microsatellite example. We also note the potential 
for substantial computational speedup by parallelizing the E-step. When discrete observations 
from a BDP are independent, the E-step may be performed in parallel for every observation. For 
example, E(C/|Yi) can be computed simultaneously for i = 1,...,N. When speed is an issue, 
graphics processing units may prove useful in reducing the computational cost of EM algorithms 



( Zhou et al[|2010D . 

With regard to our example, we present a novel way of studying the evolution of microsatellite 
repeats using a generalized linear model. Previous efforts often ignore the evolutionary relation- 
ship between organisms, use incomplete or equilibrium models of repeat numbers, or fit separate 
models to motifs of different sizes. We treat motif size as a categorical variable and incorporate 
motif nucleotide composition, allowing us to fit a single model to all the microsatellite observations 



simultaneously. Though our rate specification (37) and resulting equilibrium distribution (38) are 



intended to be somewhat simplistic, more sophisticated models that are informed by biological 
considerations may be fruitful. The only requirement in our setup is that the gradient and Hessian 
of Afc, fiky and TTfc be available for any repeat number k. Although our microsatellite example is 
limited in scope, it is easy to imagine a more comprehensive study. For example, incorporating 
more sophisticated motif nucleotide composition covariates and location of the microsatellite on the 
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chromosome might provide additional insight into the evolutionary process. Our EM framework is 
nearly ideal for these types of studies, since the number of unknown parameters does not substan- 
tially increase the computational burden of the M-step, and the E-step is completely unaffected by 
the number of parameters. 

Interestingly, we attempted to use the generic nonlinear regression R function nlm to validate 
the MLEs obtained by our EM algorithm for the microsatellite evolution problem, starting at a 
variety of initial values, including the MLE found by our EM algorithm. This naive optimizer 
failed to converge in every case. We speculate that this is because the small numerical errors in the 
likelihood evaluation have similar order of magnitude as the curvature of the likelihood function 
near the maximum. Our EM aglorithms take advantage of analytic derivatives of the surrogate 
function instead of the likelihood, and hence are less susceptible to small errors in the numerical 
gradient. 



5 Conclusion 

Previous work on parameter estimation in BDPs almost exclusively confines itself to inference 
of birth and death rates under the simple linear model. To rectify this situation, we present a 
flexible and robust framework for deriving EM algorithms to estimate parameters in any general 
BDP, using discrete observations. We hope that this contribution encourages development of more 
sophisticated and realistic birth-death models in applied work, since researchers can now estimate 
parameters using more complicated rate structures, even when the data are observed at discrete 
times. 



Software 

A software implementation of the EM algorithms for general BDPs used in this paper is currently 
available from FWC (by request through the Editor to maintain reviewer anonymity) and will be 



deposited in CRAN (2011) before publication. 
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Figure 1: A sample path from a birth-death process (BDP) X{t). The process starts at state 
X{0) = 1 and is at state X(t) = 4 at time t. At right are schematic representations of the time spent 
in each state T^, the number of up steps Uk, and the number of down steps D^. These quantities 
are the sufficient statistics for estimators of rate parameters in general birth-death processes. 
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Figure 2: Effect of quasi-Newton acceleration on iterates of the expectation-maximization (EM) 
algorithm for a simple linear BDP with birth rate A and death rate jj,. Contour lines sketch the 
log-likelihood from N = 50 discrete samples. Iterates are shown with the "+" symbol. On the 
left, ordinary EM iterates converge very slowly in the neighborhood of the maximum, for a total 
of 36 iterations. On the right, EM iterates using quasi-Newton acceleration make large jumps and 
converge rapidly in 15 iterations. 
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Figure 3: Reversibility of the BDP implies that the evolutionary relationship between contemporary 
chimpanzees and the most recent common ancestor can be inverted. On the left, the most recent 
common ancestor of chimpanzees and humans lived at time T in the past. At a certain locus, 
chimpanzees have a microsatellite consisting of 2 repeats of the motif AAC, and at an orthologous 
locus, humans have 3 repeats of the motif. The number of repeats in the ancestor is unknown. On 
the right, using a probabilistic justification explained in the text, we may interpret the evolutionary 
relationship between chimpanzees and humans as unidirectional, while "integrating out" the number 
of repeats at the ancestral locus. 
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Model 



Endpoint- 
Rejection conditioned 



Time- 



Laplace- 



Quantity 


sampling 


simulation 


convolution 


convolut 


E(C/|Y) 


1.449 


0.741 


19.606 


0.084 


E{D\Y) 


1.375 


0.743 


21.224 


0.086 


IE(Tparticle Y) 


1.432 


0.636 


16.488 


0.087 




1.192 


0.697 


15.669 


0.085 


E{D\Y) 


1.324 


0.689 


21.058 


0.086 


lE(rparticiclY) 


1.319 


0.703 


14.961 


0.089 


E{U\Y) 


50.810 


0.162 


21.907 


0.102 


E(Z)|Y) 


56.957 


0.180 


20.851 


0.102 




50.764 


0.168 


21.623 


0.107 


E{U\Y) 


7.880 


* 


5.295 


0.059 


E{D\Y) 


8.886 


* 


2.749 


0.048 


ZkiN - k)kE{Tk\Y) 


8.456 


* 


4.269 


0.053 



Simple linear (2.4.1) 
A = 0.5, /i = 0.3 



Immigration (2.4.2) 
A = 0.5, u 
H = 0.3 



0.2 



Logistic (2.4.3) 
A = 0.5, a = 0.2 
H = 0.3 

SIS ( |2.4.4[ ) 

j3 = 0.5, 7 = 0.3 



Table 2: Compute times (seconds) to perform various E-steps for four different BDP models. 
We report text section numbers in which the models are described in parentheses. For each E- 
step, we consider several methods. In all cases, the Laplace method takes substantially less time. 
The endpoint-conditioned simulation method fails for the susceptible-infectious-susceptible (SIS) 
infectious disease model. 



37 



Model Parameter True Estimate SE 

Simple linear (A'" = 500) A 



(2.4.1) f. 



Immigration (A^ = 800) A 



(2.4.2) ly 
Logistic (N = 1500) A 



( |2.4.3[ ) a 

SIS (N = 1000) /3 
pAA\ 7 

GLM (A^ = 1000) 6>A,i 
( |2X5| ) ex,2 

e 



0.5 





5039 





0269 


0.2 





1981 





0254 


0.2 





2182 





0129 


0.1 





1016 





0213 


0.25 





2488 





0231 


0.3 





2917 





0035 


0.5 





4942 





0397 


0.05 





0456 





0633 


0.1 





1025 





0048 


2.0 


2 


1374 





0367 


0.25 





2585 





0393 


0.1 





1143 





0402 


0.2 





1973 





0457 


0.05 





0877 





0457 



Table 3: Point-estimates and their standard errors (SE) for simulated observations under various 
BDPs. We report the text section describing each of the models in parentheses. The method for 
generating the rates in the generalized linear model (GLM) BDP is described in the text. 
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Parameter Covariate Estimate SE 



01 

05 
06 



Intercept 



n = 2 
n > 3 

Pa 
Pc 
Pt 

birth 



-1.3105 
0.2854 
-1.5405 
0.2207 
-0.3822 
0.0477 
-0.0889 



0.1236 
0.0983 
0.1079 
0.1725 
0.0577 
0.0002 
0.0039 



a 



Table 4: Maximum likelihood estimates of parameters in the microsatellite model and their asymp- 
totic standard errors. The first three elements of correspond to the motif size r^, and the last three 
correspond to the motif nucleotide composition. The parameter a controls the difference between 
the birth and death rates. The ith microsatellite birth rate is then A = exp(a -|- z\9) and the death 
rate is /x = eyi\>{z\9). Estimated birth and death rates are higher for dinucleotide repeats than for 
mononucleotide repeats or microsatellites whose motifs have 3, 4, or 5 nucleotides. Mircrosatellites 
whose motif consists, for example, of A nucleotides have higher birth and death rates compared to 
G nucleotides. 
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